We train graph neural networks on halo catalogs from Gadget N-body simulations to perform field-level likelihood-free inference of cosmological parameters. The catalogs contain $\lesssim$5,000 halos with masses $\gtrsim 10^{10}~h^{-1}M_\odot$ in a periodic volume of $(25~h^{-1}{\rm Mpc})^3$; every halo in the catalog is characterized by several properties such as position, mass, velocity, concentration, and maximum circular velocity. Our models, built to be invariant to permutations, translations, and rotations, do not impose a minimum scale on which to extract information, and are able to infer the values of $\Omega_{\rm m}$ and $\sigma_8$ with a mean relative error of $\sim6\%$ when using positions plus velocities and positions plus masses, respectively. More importantly, we find that our models are very robust: they can infer the values of $\Omega_{\rm m}$ and $\sigma_8$ when tested on halo catalogs from thousands of N-body simulations run with five different N-body codes: Abacus, CUBEP$^3$M, Enzo, PKDGrav3, and Ramses. Surprisingly, the model trained to infer $\Omega_{\rm m}$ also works when tested on thousands of state-of-the-art CAMELS hydrodynamic simulations, run with four different codes and subgrid physics implementations. Using halo properties such as concentration and maximum circular velocity allows our models to extract more information, at the expense of model robustness. This may happen because the different N-body codes are not converged on the relevant scales corresponding to these parameters.
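The invariances described above can be obtained by building the graph from relative quantities only: if edges carry pairwise distances rather than raw coordinates, any rigid rotation or translation of the catalog leaves the model's input unchanged, and symmetric pooling gives permutation invariance. The following is a minimal toy sketch of that idea, not the authors' actual architecture; the function name, linking radius, and feature choices are illustrative assumptions.

```python
import numpy as np

def invariant_summary(pos, masses, r_link=2.5, box=25.0):
    """Toy permutation/translation/rotation-invariant graph summary of a
    halo catalog. Edges connect halos closer than r_link (with periodic
    wrapping); the only geometric edge feature is the pairwise distance,
    so rigidly rotating or translating the catalog leaves the output
    unchanged, and symmetric pooling removes any dependence on halo order.
    """
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box * np.round(diff / box)          # minimum-image convention
    dist = np.sqrt((diff ** 2).sum(-1))
    adj = (dist < r_link) & (dist > 0)          # adjacency within r_link
    # one message-passing step: each halo aggregates distance-weighted
    # neighbor masses (a stand-in for a learned edge/node network)
    msg = (adj * np.exp(-dist)) @ masses
    # symmetric global pooling -> permutation invariance
    return np.array([msg.mean(), msg.max(), masses.mean()])
```

A real model would replace the fixed `exp(-dist)` weighting with trainable networks, but the invariance argument is identical.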
Cosmological shock waves are essential to understanding the formation of cosmological structures. To study them, scientists run computationally expensive high-resolution 3D hydrodynamic simulations. Interpreting the simulation results is challenging because the resulting data sets are enormous, and the shock wave surfaces are hard to separate and classify due to their complex morphologies and the intersection of multiple shock fronts. We introduce a novel pipeline, Virgo, combining physical motivation, scalability, and probabilistic robustness to tackle this unsupervised classification problem. To this end, we use kernel principal component analysis with low-rank matrix approximations to denoise data sets of shocked particles and create labeled subsets. We perform supervised classification to recover full data resolution with stochastic variational deep kernel learning. We evaluate on three state-of-the-art data sets of varying complexity and achieve good results. The proposed pipeline runs automatically, has only a few hyperparameters, and performs well on all tested data sets. Our results are promising for large-scale applications, and we highlight the scientific work they now enable.
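The "kernel PCA with low-rank matrix approximations" step can be illustrated with a Nyström-style sketch: eigendecompose the small kernel matrix among a few landmark points instead of the full $n \times n$ kernel, then project every sample through it. This is a generic low-rank kernel PCA construction under assumed parameter names and an RBF kernel, not Virgo's actual implementation.

```python
import numpy as np

def nystroem_kpca(X, n_landmarks=20, n_components=2, gamma=0.5, seed=0):
    """Kernel PCA features via a Nystroem low-rank approximation.

    Rather than eigendecomposing the full n x n RBF kernel matrix
    (O(n^3)), we eigendecompose the m x m kernel among m landmark points
    and map all n samples through it, costing O(n m^2).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    L = X[idx]

    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    W = rbf(L, L)                              # m x m landmark kernel
    C = rbf(X, L)                              # n x m cross kernel
    vals, vecs = np.linalg.eigh(W)             # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]     # sort descending
    keep = vals[:n_components] > 1e-12         # drop numerically null modes
    U = vecs[:, :n_components][:, keep] / np.sqrt(vals[:n_components][keep])
    return C @ U                               # low-rank feature map
```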
Context: Modeling satellite galaxy abundance $N_s$ in galaxy clusters is a key element in modeling the halo occupation distribution (HOD), which itself is a powerful tool to connect observational studies with numerical simulations. Aims: To study the impact of cosmological parameters on satellite abundance in both cosmological simulations and mock observations. Methods: We build an emulator (HODEMU, \url{https://github.com/aragagnin/hodemu/}) of satellite abundance based on the cosmological parameters $\Omega_m, \Omega_b, \sigma_8, h_0$ and redshift $z$. We train our emulator on Magneticum hydrodynamic simulations spanning 15 different cosmologies, each over 4 redshift slices in $0<z<0.5$, and for each setup we fit the normalization $A$, the log-slope $\beta$, and the Gaussian fractional scatter $\sigma$ of the $N_s-M$ relation. The emulator is based on multivariate-output Gaussian process regression (GPR). Results: We find that $A$ and $\beta$ depend on the cosmological parameters, even if weakly, especially on $\Omega_m$ and $\Omega_b$. This dependency can explain some of the discrepancies found in the literature between the satellite HODs of different cosmological simulations (Magneticum, Illustris, BAHAMAS). We also show that the cosmology dependency of satellite abundance differs between full-physics (FP), dark-matter-only (DMO), and non-radiative simulations. Conclusions: This work provides a preliminary calibration of the cosmological dependency of satellite abundance in high-mass halos; we show that modeling it with cosmological parameters is necessary to interpret satellite abundance, and we show the importance of using FP simulations in modeling this dependency.
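The emulation step described above (cosmological parameters in, $A$, $\beta$, $\sigma$ out) can be sketched with a bare-bones multi-output GP regression: fit one shared RBF kernel over the training inputs and predict all output columns jointly via $K_* K^{-1} Y$. This is a generic zero-mean GP sketch with hypothetical names and hyperparameters, not HODEMU's actual emulator.

```python
import numpy as np

def gpr_fit_predict(X_train, Y_train, X_test, length=0.5, noise=1e-6):
    """Minimal multi-output GP regression with an RBF kernel (zero mean).

    Each output column (e.g. A, beta, sigma of an N_s-M relation) shares
    the same kernel; the posterior mean at the test inputs is
    K(X_test, X_train) @ K(X_train, X_train)^-1 @ Y_train.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length ** 2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, Y_train)        # K^-1 Y, solved stably
    return rbf(X_test, X_train) @ alpha
```

A production emulator would additionally optimize the kernel hyperparameters and report posterior variances; the interpolation mechanics are the same.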
Creating compelling captions for data visualizations has been a longstanding challenge. Visualization researchers are typically untrained in journalistic reporting and hence the captions that are placed below data visualizations tend to be not overly engaging and rather just stick to basic observations about the data. In this work we explore the opportunities offered by the newly emerging crop of large language models (LLM) which use sophisticated deep learning technology to produce human-like prose. We ask, can these powerful software devices be purposed to produce engaging captions for generic data visualizations like a scatterplot. It turns out that the key challenge lies in designing the most effective prompt for the LLM, a task called prompt engineering. We report on first experiments using the popular LLM GPT-3 and deliver some promising results.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
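Among the practices surveyed above, k-fold cross-validation on the training set is mechanical enough to sketch. The function name and seeding are illustrative, not anything prescribed by the survey:

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k shuffled, near-equal folds and yield
    (train_idx, val_idx) pairs, as in k-fold cross-validation: every
    sample appears in exactly one validation fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```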
We report on experiments for the fingerprint modality conducted during the First BioSecure Residential Workshop. Two reference systems for fingerprint verification have been tested together with two additional non-reference systems. These systems follow different approaches of fingerprint processing and are discussed in detail. Fusion experiments involving different combinations of the available systems are presented. The experimental results show that the best recognition strategy involves both minutiae-based and correlation-based measurements. Regarding the fusion experiments, the best relative improvement is obtained when fusing systems that are based on heterogeneous strategies for feature extraction and/or matching. The best combinations of two/three/four systems always include the best individual systems, whereas the best verification performance is obtained when combining all the available systems.
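Fusing heterogeneous matchers as described above typically requires bringing their scores onto a common scale first. A common baseline (not necessarily the fusion rule used in this workshop) is min-max normalization followed by the sum rule; the function name below is hypothetical:

```python
import numpy as np

def fuse_scores(score_lists):
    """Sum-rule fusion of scores from heterogeneous matchers.

    Each matcher's scores are min-max normalized to [0, 1] so that, e.g.,
    minutiae-based and correlation-based outputs become comparable, then
    averaged. Assumes each matcher produces non-constant scores.
    """
    fused = np.zeros_like(np.asarray(score_lists[0], dtype=float))
    for s in score_lists:
        s = np.asarray(s, dtype=float)
        s = (s - s.min()) / (s.max() - s.min())   # min-max normalization
        fused += s
    return fused / len(score_lists)               # sum (mean) rule
```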
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
Probabilistic predictions from neural networks that account for predictive uncertainty during classification are crucial in many real-world and high-impact decision-making settings. In practice, however, most datasets are trained with non-probabilistic neural networks, which by default do not capture this inherent uncertainty. This well-known problem has led to the development of post-hoc calibration procedures, such as Platt scaling (logistic), isotonic and beta calibration, which transform scores into well-calibrated empirical probabilities. A plausible alternative to calibration methods is to use Bayesian neural networks, which directly model a predictive distribution. Although they have been applied to image and text datasets, their adoption in the tabular and small-data regime has been limited. In this paper, we demonstrate through experiments on a wide variety of datasets that Bayesian neural networks yield competitive performance compared to calibrated neural networks.
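Platt scaling, the simplest of the post-hoc procedures mentioned above, fits a one-dimensional logistic model $p(y{=}1\mid s) = \sigma(a s + b)$ on held-out classifier scores. A minimal gradient-descent sketch (function names and learning rate are illustrative):

```python
import numpy as np

def platt_scale(scores, labels, lr=0.1, n_iter=2000):
    """Post-hoc Platt scaling: fit p(y=1|s) = sigmoid(a*s + b) on
    held-out scores by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        g = p - y                        # dLoss/dlogit for log loss
        a -= lr * (g * s).mean()
        b -= lr * g.mean()
    return a, b

def calibrated(scores, a, b):
    """Map raw scores to calibrated probabilities."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(scores, dtype=float) + b)))
```

(The original formulation by Platt uses Newton-style optimization with smoothed targets; the model being fitted is the same.)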
In the automotive industry, the scarcity of labeled data is a typical challenge. Annotating time-series measurements requires solid domain knowledge and in-depth exploratory data analysis, which implies a high labeling effort. Conventional active learning (AL) addresses this issue by actively querying the most informative instances based on estimated classification probabilities and retraining the model iteratively. However, the learning efficiency strongly depends on the initial model, resulting in a trade-off between the size of the initial dataset and the number of queries. This paper proposes a novel few-shot learning (FSL)-based AL framework, which resolves the trade-off by incorporating a prototypical network (ProtoNet) into the AL iterations. The results show, on the one hand, robustness to the initial model and, on the other hand, improved learning efficiency through the active selection of the support sets in each iteration. The framework has been validated on the UCI HAR/HAPT dataset and a real-world braking maneuver dataset. The learning performance significantly surpasses conventional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
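The two ingredients combined above, prototypical classification and uncertainty-based querying, are easy to sketch: a ProtoNet classifies by distance to per-class mean embeddings, and an AL step can query the pool samples with the smallest margin between the two most probable classes. This is a generic illustration with hypothetical names, not the paper's framework; a real ProtoNet would compute prototypes in a learned embedding space rather than raw feature space.

```python
import numpy as np

def prototypes(X_support, y_support):
    """Class prototypes: the mean embedding of each class's support set."""
    classes = np.unique(y_support)
    return classes, np.stack([X_support[y_support == c].mean(0) for c in classes])

def proto_predict(X, protos):
    """Class probabilities as a softmax over negative squared Euclidean
    distances to the prototypes."""
    d2 = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits = -d2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def query_most_uncertain(X_pool, protos, n_query=1):
    """Active-learning step: pick the pool samples with the smallest
    margin between the two most probable classes."""
    p = proto_predict(X_pool, protos)
    p_sorted = np.sort(p, axis=1)
    margin = p_sorted[:, -1] - p_sorted[:, -2]
    return np.argsort(margin)[:n_query]
```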
Nonstationary source separation is a well-established branch of blind source separation, with many different methods. However, no large-sample results are available for any of these methods. To bridge this gap, we develop large-sample theory for NSS-JD, a nonstationary source separation method based on the joint diagonalization of blockwise covariance matrices. We work under an instantaneous linear mixing model of independent Gaussian nonstationary source signals, together with a very general set of assumptions: besides boundedness conditions, the only assumptions we make are that the sources exhibit finite dependence and that their variance functions differ sufficiently to be asymptotically separable. Under these conditions, we establish the consistency of the unmixing estimator and its limiting Gaussian distribution at the standard square-root rate. Simulation experiments are used to verify the theoretical results and to study the effect of block length on separation.
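For the special case of two time blocks, joint diagonalization of the blockwise covariances is exact: whiten with the first block's covariance, then eigendecompose the whitened second-block covariance. The sketch below illustrates that mechanism under assumed names; the NSS-JD method itself handles many blocks with an approximate joint diagonalizer.

```python
import numpy as np

def nss_jd_two_blocks(X, split):
    """Sketch of nonstationary separation from two time blocks.

    Whitening with the covariance of block 1 maps it to the identity;
    eigendecomposing the whitened covariance of block 2 then jointly
    diagonalizes both (exactly, for two symmetric matrices), giving an
    unmixing matrix up to sign and permutation of the sources.
    """
    B1, B2 = X[:split], X[split:]
    C1 = np.cov(B1, rowvar=False)
    C2 = np.cov(B2, rowvar=False)
    vals, vecs = np.linalg.eigh(C1)
    W_white = vecs @ np.diag(vals ** -0.5) @ vecs.T   # C1 -> identity
    M = W_white @ C2 @ W_white.T                      # whitened block-2 cov
    _, U = np.linalg.eigh(M)                          # diagonalizes M
    return U.T @ W_white                              # unmixing matrix
```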